1. Introduction

The Nucleic Acid Database (NDB) [1] was established in 1991
as a resource for specialists in the field of nucleic acid
structure. Its purpose was to gather all the structural
information about oligonucleotides that had been obtained
from x-ray crystallographic experiments and to organize it
in such a way that it would be easy to retrieve the
coordinates, the information about the experimental
conditions used to derive these coordinates, and the
structural information that could be derived from these
coordinates. It was clear from the beginning that many of
the users of these data would not themselves be
crystallographers, and that the information provided by the
database had to be presented in such a way as to maximize
its utility for various types of modeling and structure
prediction.

As the project progressed, many new technologies developed
that presented challenges and opportunities. These include
the development of the standard interchange format for
handling crystallographic data, called the Macromolecular
Crystallographic Information File (mmCIF), and the explosive
use of the World Wide Web (WWW).

Certain commercial equipment, instruments, or materials are
identified in this paper to foster understanding. Such
identification does not imply recommendation or endorsement
by the National Institute of Standards and Technology, nor
does it imply that the materials or equipment identified are
necessarily the best available for the purpose.



2. Database Contents 

Structures available in the NDB include RNA and 
DNA oligonucleotides with two or more bases. These 
oligonucleotides may be complexed with drugs and ions. 
Structures of larger nucleic acid containing crystals, 
including protein-DNA and protein-RNA structures, are 
also curated and included in the archive. Table 1 shows 
the current holdings of the NDB. 

Current literature is scanned on a regular basis, and
structures suitable for inclusion in the NDB are noted.
Coordinate sets are retrieved from the Protein Data Bank
(PDB) [2] and are then filtered into the NDB format. In the
case of oligonucleotides not complexed with proteins,
coordinate sets submitted by the author for submission into
the PDB are processed. Starting in January 1996, the NDB
became a direct deposition site for these oligonucleotide
structures.

In addition to coordinate data, information relevant to the
crystallographic experiment is abstracted from the primary
literature for inclusion into the database. This includes
crystallization conditions, refinement statistics and data
collection statistics. Other derived information, such as
the distances, angles, torsion angles, and base morphology
parameters, is calculated from the coordinate data and
placed in the database. Tables 2a and 2b list summaries of
the information currently in the NDB.

Table 1. NDB holdings as of October 1995
408 structures (390 released)

Structure Type                     Number

A-DNA                                  51
DNA/RNA Hybrid                         11
A-RNA                                  10
DNA-Drug Complexes                     93
B-DNA                                  66
RNA-Drug Complexes                     19
Z-DNA                                  47
tRNA                                   10
Unusual DNA                            21
Protein-Nucleic Acid Complexes         66
Unusual RNA                            14



Table 2a. Primary experimental information stored in the NDB

Structure summary (a)
    Descriptor
    NDB, PDB, and CSD names
    Coordinates available (yes/no)
    Modifiers (yes/no)
    Mismatches (yes/no)
    Drugs (yes/no)

Structural description (a)
    Sequence
    Structure type (A/B/Z/RH/U/P)
    Description of modifiers of base, phosphate, and sugar
    Description of base mismatch
    Name and binding type of drug
    Description of base pairing
    Description of contents of asymmetric unit

Citation (a)
    Authors
    Title
    Journal
    Volume
    Pages
    Year

Crystal data (a)
    Cell dimensions
    Space group

Data collection description (a)
    Source of radiation
    Data collection device
    Radiation wavelength
    Temperature
    Resolution range
    Total and unique number of reflections

Crystallization description (a)
    Method
    Temperature
    pH value
    Composition of solutions

Refinement information (a)
    Method
    Program
    Number of reflections used for refinement
    Data cutoff
    Resolution range
    R-factor
    Refinement of temperature factors and occupancies

Coordinate information (b)
    Atomic coordinates, occupancies and temperature factors
    for asymmetric unit
    Coordinates for symmetry related strands
    Symmetry related coordinates in unit cell (packing)
    Orthogonal or fractional coordinates

Table 2b. Derivative information stored in the NDB

Distances (a)
    Chemical bond lengths
    Virtual bonds (involving phosphorus atoms)

Torsions (c)
    Backbone and side chain torsion angles
    Pseudorotational parameters

Angles (a)
    Valence bond angles
    Virtual angles (involving phosphorus atoms)

Base morphology (a)
    Parameters calculated by different algorithms

(a) Reports can be generated in either ASCII or LaTeX.
(b) Reports can be generated as an NDB or PDB coordinate
    file, a Kinemage template, or as PostScript molecular
    graphics.
(c) Parameters can be displayed in either LaTeX or ASCII
    tables, or as a PostScript conformation wheel.



3. Data Processing 

3.1 Data Entry and Integrity Checks 

The scheme for data processing is given in Figs. 1a and 1b.
A set of filter programs has been developed that allows this
process of data entry and integrity checking to be highly
automated. A key feature of the system is the use of a
template based on mmCIF. A template is a CIF data file that
includes definition and example information from the mmCIF
dictionary, which serves as comments preceding each data
category. The CIF template is a skeleton file that is easily
used with a text editor. The NDB has created software tools
to populate the CIF template with data from a variety of
file formats. Items which cannot be loaded electronically
are flagged for later manual entry. This method allows the
NDB to work with a large variety of formats. For example,
all items that are fully parseable from the PDB can be
loaded into a template. The rest of the information,
provided in the manuscripts or in the text parts of the PDB
file, can be entered by a data curator. Files in completely
different formats can be handled by reordering the mmCIF
tokens. After the template is completed, new items that can
be derived from the coordinate sets, such as the DNA
sequence, are added using the NDB filter programs. Checks
are built into the filter programs that ensure that the
coordinates have standard ordering, and that the
nomenclatures of both the polymers and the ligands are
consistent. In addition, programs have been written that
allow many of the data items to be automatically extracted
from the commonly used refinement programs for nucleic
acids. The use of these filter programs permits the data
processing procedure, including checking, to be completed in
about 3 h per structure. The rate-limiting step is the
gathering of missing data items that are not included in any
of the standard computer files used as input into data
processing.

Fig. 1a. A schematic diagram of NDB data processing that
illustrates the central role of the NDB filter software in
automating the exchange of information between a variety of
input formats and the mmCIF template and data file archival
format.

[Fig. 1b diagram labels. Derived data: bond lengths, bond
angles, torsion angles, virtual bonds and angles.
Encapsulated programs: NEWHELIX, Dials & Windows, Babcock,
Tung, NUPARM. Database access: NDBquery (X and WWW
versions). Graphical reports: 2D plots, scatter plots,
histograms, molecular graphics, packing diagrams,
conformation wheels, Kinemage files, XGobi input files.
Tabular reports: ASCII tables, PostScript tables.
Coordinates: asymmetric unit, biological unit, packing
coordinates (popular file formats). Structural Atlas: A-DNA,
B-DNA, protein/nucleic acid complexes.]

The result of these processes is a flat file in the NDB 
format which is ready to be loaded into the database. 
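
The template-filling step can be pictured with a minimal Python sketch. It is an illustration only, not the NDB filter software: the helper name `populate_template`, the sample item list and the sample values are assumptions made for this example. The item names follow mmCIF naming, and `?` is the CIF marker for an unknown value.

```python
# Hypothetical sketch of template population; not the NDB filter programs.
FLAG = "?"  # CIF marker for an unknown value; used here to flag manual entry


def populate_template(template_items, parsed_values):
    """Fill each template item from parsed input and flag what is missing.

    template_items: ordered list of item names expected by the template
    parsed_values:  dict of values that could be loaded electronically
    Returns (filled, flagged): the completed dict and the list of items
    that still need manual entry by a curator.
    """
    filled = {}
    flagged = []
    for item in template_items:
        value = parsed_values.get(item)
        if value is None or str(value).strip() == "":
            filled[item] = FLAG
            flagged.append(item)
        else:
            filled[item] = value
    return filled, flagged


# Illustrative input: two items were parsed, one was not available.
template = ["_cell.length_a", "_cell.length_b", "_exptl_crystal_grow.pH"]
parsed = {"_cell.length_a": "24.87", "_cell.length_b": "40.39"}
filled, flagged = populate_template(template, parsed)
print(flagged)
```

In this sketch the flagged list plays the role of the items left for later manual entry.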

3.2 Database Management 

Once the first level of checks has been made on the data,
they are entered into a relational database using SYBASE as
the database management system. Over 60 tables are created
from the original raw data. A simple menu driven program,
NDBquery, allows the user to interact with the database
using a natural language rather than SQL. The same program
manages the calculation of derived quantities, including
distances, angles, torsion angles and base morphology
parameters, for each structure; these are then loaded into
the database.
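
As an illustration of one derived quantity, the following Python sketch computes a signed torsion angle from four atomic positions. It uses the standard vector formula and is not taken from NDBquery; the function names and the test coordinates are assumptions made for this example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees (0 = cis, 180 = trans)."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates chosen so the answer is easy to verify by hand.
angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Bond lengths and valence angles follow from the same vector helpers (a norm and a normalized dot product, respectively).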



[Fig. 1b diagram label: Network Server (WWW/Gopher/FTP).]

Fig. 1b. Schematic view of the data flow in and out of the
NDB database as of October 1995. The figure illustrates the
generation of derived structural features by the NDBquery
program using both internal functions and encapsulated
external programs. The collection of report types created by
NDBquery is also shown. All of these reports are accessible
via the NDB network server.



4. Information Retrieval 
4.1 Constraint Generation 

The NDB uses a two-phase system to query the database. In
the first pass, the structural features that are to be
considered are selected. Any of the data items stored in the
database can serve as a selection constraint. For example,
it is possible to select structures of a particular type
which have torsion angles in a particular range and which
have been determined by a particular author. Two examples of
the use of structure selection constraints are presented in
Tables 3a and 3b.

It is possible to use either the menu driven interface to
NDBquery or the WWW forms based system to generate selection
constraints. The advantage of the latter method is that it
places no restrictions on the user other than the ability to
use the World Wide Web using either Netscape or Mosaic. A
sample query using the WWW access is shown in Fig. 2.

4.2 Report Generation 

Once the selection constraints are defined, a large variety
of reports can be generated that describe any of the
properties that are stored in the database. The simplest
type of report is the list of coordinates for the selected
structures. In addition, the NDBquery program produces
reports in a wide variety of formats. Tabular reports such
as those shown in Fig. 3 can be produced in either ASCII or
PostScript formats.

Graphical reports relating any two properties can be
generated. It is possible to produce scatter charts,
histograms, and pie charts that can be used to analyze the
properties of the structures contained within the database.
These report features were used to examine the frequency
distributions as well as the correlations of torsion angles
of the three classes of DNA duplexes. In order to automate
this type of survey, batch query capabilities were built
into the system. Examples of graphical outputs are shown in
Fig. 4.

The NDBquery program also produces molecular graphics in a
variety of formats. Structures can be depicted using color
codes for the properties of the atoms or residues. Automatic
packing pictures are generated in PostScript format using
NDBquery and in raster form using NDBview [3]. Various types
of representations, including ball and stick and van der
Waals spheres, are available (Fig. 5).

There are provisions for detailed formatting so that a 
complete set of publication quality reports for a set of 
structures can be produced. To simplify the query 
process, some standard and commonly used queries are 
saved and made available for the user. In addition, the 
user may save her own queries to be used repeatedly for 
a particular project. 

The WWW forms based interface also allows for 
report generation. Coordinates may be retrieved in 
mmCIF, NDB or PDB format. It is also possible to 
retrieve an Atlas page (see later) and to view the struc- 
ture using a dynamic viewer. The latest version of the 
WWW Interface can also create tabular reports based 
on any of the features contained in the database. 



Table 3a. Example 1: Structure selection of B-DNAs containing the residue sequence "C G C G" without base modifiers, mismatches, or drugs

Table                    Property               Operator   Operand    Logical

structural_information   structure_type         =          B          AND
structural_information   Sequence_of_Strand_A   like       %CGCG%     AND
structure_summary        base_modifier          is null               AND
structure_summary        mismatch               is null               AND
structure_summary        drug                   is null               AND

Table 3b. Example 2: Structure selection of B-DNAs with resolution <1.9 Å and R factors <0.17 by authors A. Rich, R. E. Dickerson, or O. Kennard

Table                    Property           Operator   Operand           Logical

structural_information   structure_type     =          B                 AND
r_factor                 Up_Lim_Resol_Ref   ≤          1.9               AND
r_factor                 R_Value            <          0.17              AND
citation                 authors            like       R. E. Dickerson   OR
structural_information   structure_type     =          B                 AND
r_factor                 Up_Lim_Resol_Ref   ≤          1.9               AND
r_factor                 R_Value            <          0.17              AND
citation                 authors            like       A. Rich           OR
structural_information   structure_type     =          B                 AND
r_factor                 Up_Lim_Resol_Ref   ≤          1.9               AND
r_factor                 R_Value            <          0.17              AND
citation                 authors            like       O. Kennard
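
The rows of Tables 3a and 3b read naturally as the conditions of an SQL WHERE clause. The Python sketch below shows one way such rows could be turned into SQL text. It is a hypothetical illustration: the function `build_where` is an assumption made for this example, the SQL actually generated by NDBquery is not given in this paper, and the joins between tables are omitted.

```python
def build_where(constraints):
    """Turn (table, column, operator, operand, logical) rows into SQL text.

    Returns the WHERE clause and the list of values to bind to the "?"
    placeholders. Table and column names are inserted directly into the
    text, so they must come from a trusted schema list, not from raw
    user input.
    """
    allowed_ops = {"=", "<", "<=", ">", ">=", "like", "is null"}
    parts = []
    params = []
    for i, (table, column, op, operand, logical) in enumerate(constraints):
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        if op == "is null":
            parts.append(f"{table}.{column} IS NULL")
        else:
            parts.append(f"{table}.{column} {op.upper()} ?")
            params.append(operand)
        # The logical connective joins this row to the next one, so the
        # connective on the final row is ignored.
        if i < len(constraints) - 1:
            parts.append(logical)
    return "WHERE " + " ".join(parts), params


# First three constraint rows of Example 1 (Table 3a).
example = [
    ("structural_information", "structure_type", "=", "B", "AND"),
    ("structural_information", "Sequence_of_Strand_A", "like", "%CGCG%", "AND"),
    ("structure_summary", "base_modifier", "is null", None, "AND"),
]
where, params = build_where(example)
print(where)
print(params)
```

Because AND binds more tightly than OR in SQL, the flat row list of Table 3b is evaluated as three AND groups joined by OR, one group per author, without any parentheses.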
[Fig. 2 screen images of the WWW interface. Panel titles: NDB
Structure Selection Menu (options: Table Selection Menu,
Reset Query, Execute Query, Exit); Table Selection Menu
(General Information, Structure Description); Column
Selection Menu (columns in table structural_information);
Query Condition Menu (value, comparison operator, logical
operator); NDB Structure Selection.]

Fig. 2. Sequence for a simple query, i.e., choosing structures that contain the specific sequence ACGCG using the WWW Interface, version 2.0
(October 1995).



Beginning from the upper left:

a. The Table Selection Menu from the NDB Structure Selection
   Menu is chosen.

b. The Structural_information menu is selected from the Table
   Selection Menu.

c. Sequence_of_Strand_A is selected from the Column Selection
   Menu.

d. The desired sequence, A C G C G, is entered in capital
   letters with spaces separating each residue in the provided
   field. To move to the next step, the Continue bar is
   selected.

e. Once all of the desired constraints are selected, Execute
   Query is pressed from the top of the Column Selection Menu.

A list of the NDB identifiers of the structures containing the
sequence ACGCG is presented. The user may now:

Retrieve coordinates in NDB Format

Retrieve coordinates and the bibliographic information in NDB
Format (Full Entry)

Retrieve coordinates in PDB Format

Display the structure using a remote viewer (launching RasMol
viewer on ndbserver)

Display the structure using a local viewer (launching your own
viewer)

Display the Atlas Entry for the structure
[Fig. 3. Sample tabular reports produced by NDBquery. Report
titles: "Citations for Structures With Coordinates by Author
A. H.-J. Wang Containing the Sequence CGCGCG" (columns: NDB
ID, Citation); "Cell Dimensions for Structures With the
Sequence A T G C" (columns: NDB ID, Descriptor, a, b, c,
Alpha, Beta, Gamma, SpcGrp, Coord); "Structures With a G-T
Mismatch" (columns: NDB ID, A Strand, B Strand, Descriptor).]
Fig. 4. Examples of PostScript graphs created by NDBquery. (a) Histogram of the distribution of the C5'-C4' bond lengths in Z-DNA. (b) Pie chart
showing the distribution of structure types in the NDB. (c) Scatter chart of two torsion angles in successive residues of high resolution B-DNA
structures. (d) Conformation wheel of the observed torsion angles in the Dickerson dodecamer, BDL001 [11].

Fig. 5. Examples of PostScript molecular graphics created by NDBquery for the self-complementary duplex d(CGATCGATCG)2, BDJ025 [12].
(a) Ball and stick. (b) Stereotriptych [13]. (c) Four representative views. (d) Packing diagrams.

Fig. 6. Torsion angle wheels for the B-DNA structure BDL001 [11]. The expected ranges for A-DNA, B-DNA and Z-DNA are shaded. In this
example, all of the values for the torsion angles fall completely within the B-DNA range.



5. Structure Validation 

5.1 Standard Dictionaries 

A major goal of the NDB Project is to develop and 
distribute methods to validate the structural features 
of nucleic acids. The first step in this process was to 
develop standard dictionaries of the valence geometry of 
oligonucleotides. Various dictionaries had been used by 
refinement programs, and it was felt that a new set of 
standard numbers should be derived. This was done 
by using very high resolution structures from the 
Cambridge Structural Database (CSD) [4]. Very accu- 
rate values were derived for the bases [5] and for the two 
standard conformations of the sugars [6]. The limited 
size of the small molecule sample made derivation of the 
phosphate geometry less satisfying. Nonetheless, it was 
possible to use these values and their standard deviation 
to develop force constants that could be used with 
X-PLOR [7]. Structures refined with these new values 
yielded much reduced rms deviations between the 
refined and the target geometries. 

The NDB uses these standard values for the valence 
geometry to check structures contained within the 
database. 
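
A check of this kind can be sketched in a few lines of Python. The sketch is illustrative only: the function `check_geometry`, the three-standard-deviation threshold, and the numerical values are assumptions made for this example. They are placeholders, not values from the NDB dictionaries [5-7].

```python
def check_geometry(observed, standards, n_sigma=3.0):
    """Return the items that deviate from the standard by more than n_sigma.

    observed:  {name: observed value}
    standards: {name: (standard value, standard deviation)}
    Each outlier is reported as (name, observed value, deviation in sigmas).
    """
    outliers = []
    for name, value in observed.items():
        if name not in standards:
            continue  # no standard value available for this item
        mean, sigma = standards[name]
        z = (value - mean) / sigma
        if abs(z) > n_sigma:
            outliers.append((name, value, round(z, 2)))
    return outliers


# Placeholder numbers chosen only to exercise the check.
standards = {"bond_1": (1.50, 0.01), "bond_2": (1.40, 0.02)}
observed = {"bond_1": 1.56, "bond_2": 1.41}
outliers = check_geometry(observed, standards)
print(outliers)
```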

5.2 NDB Surveys 

There are now a sufficient number of structures contained in
the NDB to be able to develop expected values of various
structural parameters. Surveys have been done for all of the
geometric properties, including bond distances, bond angles,
and torsion angles [8]. The structures contained within the
database had valence geometries which, for the most part,
did not deviate from the small molecule results. Indeed,
subtle features related to the differences in valence
geometry between the C3' endo sugar pucker found in A-DNA
and the C2' endo sugar conformation found in B-DNA were
reflected in the survey. The only features that showed some
differences between what was observed in the small molecule
sample and the oligonucleotides were observed in the
phosphodiester geometry. These effects may be very real and
it is possible that in the future these values will be used
to validate the phosphodiester geometry.

The torsion angle survey [8] resulted in the first
experimentally derived set of ranges of torsion angles for
this class of molecules. These values may be of great use in
restrained refinement and in model building. The NDB has
also created a "scoring system" that allows the conformation
type of a DNA duplex to be assigned and checked against the
assignment by the author (Fig. 6).
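
The details of the scoring system are not given here, so the following Python sketch shows only one plausible form such a scheme could take: each conformation type is scored by the fraction of observed torsion angles that fall inside its expected range. The function names, the type labels, the angle names and all ranges are placeholders assumed for this example; they are not the NDB ranges.

```python
def in_range(angle, low, high):
    """True if angle (degrees) lies on the arc running upward from low to high."""
    angle, low, high = angle % 360.0, low % 360.0, high % 360.0
    if low <= high:
        return low <= angle <= high
    return angle >= low or angle <= high  # the arc crosses 0 degrees


def score_conformations(torsions, expected_ranges):
    """Score each conformation type by the fraction of torsions in range.

    torsions:        {angle name: observed value in degrees}
    expected_ranges: {type label: {angle name: (low, high)}}
    """
    scores = {}
    for conf_type, ranges in expected_ranges.items():
        hits = sum(
            1
            for name, angle in torsions.items()
            if name in ranges and in_range(angle, *ranges[name])
        )
        scores[conf_type] = hits / len(torsions)
    return scores


# Placeholder ranges and observations, chosen only to exercise the code.
expected = {
    "type_1": {"t1": (270.0, 330.0), "t2": (30.0, 90.0)},
    "type_2": {"t1": (150.0, 210.0), "t2": (330.0, 30.0)},
}
observed = {"t1": -60.0, "t2": 60.0}
scores = score_conformations(observed, expected)
best = max(scores, key=scores.get)
print(best, scores)
```

The assigned type can then be compared with the type stated by the author, which is the check described above.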

6. Distribution 
6.1 World Wide Web 

The NDB is available electronically via the World Wide Web
(http://ndbserver.rutgers.edu and
http://www.ebi.ac.uk/NDB/). In addition to providing direct
query access through the forms based interface, the homepage
(Fig. 7) offers access to a variety of other information.

The NDB Archives maintain information about the NDB Project,
including the Project Newsletters and the NDBquery manual,
as well as bibliographies of review articles and research
articles that cite the NDB (Fig. 8). The Archives also
furnish prepared reports about the structures in the
database, including citations, structural features and cell
dimensions. There are tables of information about the
various subcategories of structures contained in the
database, including DNA, RNA, nucleic acid-protein
complexes, and nucleic acid-drug complexes. Nucleic Acid
Dictionaries are included in the NDB Archives, and feature
X-PLOR parameters and ideal geometries for DNA/RNA bases and
sugar phosphates [5-7]. The archives are updated frequently,
and can be accessed via the WWW or by anonymous ftp.

Fig. 7. The NDB Homepage (available at http://ndbserver.rutgers.edu and mirrored at the European
Bioinformatics Institute at http://www.ebi.ac.uk/NDB/).

Fig. 8. The NDB Archives Page.

Fig. 9. An Atlas entry for the first B-DNA crystal structure, BDL001 [13]. (a) The top of the entry page shows the structure type, compound name, sequence, citation, and space group. (b) Also included
in the atlas entry are cell constants, crystallization conditions, refinement, and a link to the coordinate file for the structure. A ball and stick representation of the structure is color coded by sequence,
with thymine in blue, adenine in red, cytosine in yellow, and guanine in green. (c) The space filling and ribbon representations of the unit cell are color coded in terms of the symmetry related molecules.

Another feature of the NDB WWW site is the Atlas 
of Selected NDB Structures (Fig. 9). Each Atlas entry 
highlights the bibliographical, structural and experimen- 
tal information about each structure, as well as provid- 
ing pictures from different views and a link to the co- 
ordinate file for the structure. 

Also included on the NDB home page is the documentation for
both the Dictionary Description Language for Macromolecular
Structure (DDL) [9] and the Macromolecular Crystallographic
Information File (mmCIF) [10]. For more information on the
NDB Project and other related sites, the General Information
page provides a brief summary of the information in this
article and useful links to other sites.

6.2 Newsletter 

Published four times a year, the NDB Project 
Newsletter provides a list of recently released structures 
and any updates on the project itself. To subscribe, a 
message should be sent to ndblib@ndbserver.rutgers.edu 
with the subject "subscribe." 

6.3 Custom Queries 

Specialized and custom queries that are unavail- 
able through the forms based interface on the 
WWW may be requested by sending mail to 
ndbadmin@ndbserver.rutgers.edu. 

These requests can be for tabular reports containing the
derived quantities available in the database, such as bond
lengths, valence angles, torsion angles, or base morphology
parameters. Molecular graphics, including packing pictures,
may also be requested.

7. Future 

The NDB will continue to develop and expand its scope. Most
notable will be the full integration of mmCIF into all
aspects of data processing. The NDB plans to provide more
resource materials to researchers in the field, as well as
to casual "surfers" who may want to learn more about nucleic
acid structure.



7.1 Data Processing 

The NDB Project has served as a test bed for the 
method of data description embodied in the Macro- 
molecular Crystallographic Information File (mmCIF) 
and has employed mmCIF as an interchange format 
using a locally developed dictionary. At each stage in the 
evolution of the mmCIF dictionary, software tools have 
been developed by the NDB to evaluate the extent to 
which each dictionary would facilitate the automated 
processing of data. The result of this development is the 
collection of software tools called SIFLIB (Fig. 10). 




Fig. 10. Functional diagram of SIFLIB illustrating the interaction of
this utility library with a variety of other applications. The figure
highlights the role of SIFLIB in encapsulating access to CIF format
data and dictionaries from calling applications.



SIFLIB is a class library of tools which were designed to
encapsulate operations on CIF format files and dictionaries.
We have chosen to name this library using the more general
terminology Structure Information File (SIF) to emphasize
that these tools could be used with dictionaries for
experimental techniques other than crystallography (e.g.,
NMR). SIFLIB was developed in conjunction with the
Dictionary Description Language (DDL) Version 2.1 [9] on
which the mmCIF dictionary is based. Some of the functions
performed by SIFLIB include: reading and writing CIF format
data files and dictionaries; reading and writing individual
CIF data items; data integrity checking of CIF data items;
and navigation through the CIF schema.
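
To make the idea of reading and writing CIF data items concrete, here is a deliberately minimal Python sketch. It is not the SIFLIB interface, which is not described in this paper: the function names are hypothetical, and only single-line `_category.item value` pairs are handled. Loops, multi-line text fields and quoting rules are ignored, and the sample values are illustrative.

```python
def read_simple_cif(text):
    """Parse single-line '_category.item value' pairs from CIF-style text."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a data item
        # (for example the data_ block header).
        if not line or line.startswith("#") or not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            items[parts[0]] = parts[1].strip()
    return items


def write_simple_cif(block_name, items):
    """Write the items back as one data block, one item per line."""
    lines = [f"data_{block_name}"]
    lines += [f"{name}  {value}" for name, value in items.items()]
    return "\n".join(lines) + "\n"


sample = """data_EXAMPLE
# cell parameters (illustrative values)
_cell.length_a 24.87
_cell.length_b 40.39
"""
items = read_simple_cif(sample)
round_trip = read_simple_cif(write_simple_cif("EXAMPLE", items))
print(items)
```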

As the first version of the mmCIF dictionary nears
completion, the NDB is converting its data processing system
based on the mmCIF local dictionary to a system which is
based on the data representation in the mmCIF dictionary.
The core of this conversion is the integration of SIFLIB
into the NDB data processing scheme as shown in Fig. 10. The
key feature of this new data processing scheme is that it
takes full advantage of the data description provided by the
mmCIF dictionary, which now contains all of the information
necessary to perform detailed integrity checks for
individual data items as well as for the relationships
between data items.
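
A dictionary-driven integrity check of this kind might look like the Python sketch below. It is a simplified illustration, not the NDB implementation: the rule layout (`type`, `range`, `requires`), the function name `check_items` and the sample rules are assumptions made for this example and do not reproduce the mmCIF dictionary.

```python
def check_items(items, dictionary):
    """Check data items against per-item rules taken from a dictionary.

    items:      {item name: raw string value}
    dictionary: {item name: {"type": callable,
                             "range": (low, high),      # optional
                             "requires": [item names]}}  # optional
    Returns a list of (item name, problem) pairs.
    """
    errors = []
    for name, raw in items.items():
        rule = dictionary.get(name)
        if rule is None:
            errors.append((name, "not defined in dictionary"))
            continue
        try:
            value = rule["type"](raw)
        except ValueError:
            errors.append((name, "wrong type"))
            continue
        low, high = rule.get("range", (None, None))
        if (low is not None and value < low) or (high is not None and value > high):
            errors.append((name, "out of range"))
        # Relationship check: items that must accompany this one.
        for other in rule.get("requires", ()):
            if other not in items:
                errors.append((name, f"requires {other}"))
    return errors


# Hypothetical rules and data, for illustration only.
rules = {
    "_cell.length_a": {"type": float, "range": (0.0, None),
                       "requires": ["_cell.length_b"]},
    "_cell.angle_alpha": {"type": float, "range": (0.0, 180.0)},
}
data = {"_cell.length_a": "24.87", "_cell.angle_alpha": "190.0",
        "_cell.volume": "x"}
errors = check_items(data, rules)
print(errors)
```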




[Fig. 11 diagram labels: SQL Generator, RDBMS.]

Fig. 11. Schematic view of the NDB WWW forms based interface.
The WWW version of NDBquery is called by the WWW server and
provides the server with a description of the contents of the NDB
database, which is presented as a set of menu selections. The WWW
interface also manages the construction of SQL queries and all
communication with the NDB database.



7.2 Validation 

As a result of the surveys of both the NDB and CSD
databases, dictionaries of standard covalent geometries and
observed ranges of other structural features have been
compiled.

These dictionaries provide the foundation for the continued
development of structural validation tools that will be used
as benchmarks to evaluate each structure submitted to the
NDB.

mmCIF provides a mechanism for standardizing the encoding of
structural standards and other lengthy tabulations of
reference data in External Reference Files (ERFs).
Information stored in ERFs can be accessed using the same
software (SIFLIB) as other CIF data. We plan to integrate
structural ERFs automatically into the NDB data processing
scheme (Fig. 10).

7.3 Information Retrieval

The recently developed WWW interface to the NDB database
provides the structure selection features of the more robust
menu-driven interface, NDBquery. An enhanced version of the
WWW interface that will provide both structure selection and
report generation has recently been released.

The WWW interface is shown schematically in Fig. 11. The
figure highlights the underlying use of a CIF dictionary to
describe the database schema for the WWW interface.